Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes the ORCA parser to correctly extract scaling factors from optimization output and updates ORCA optimization keywords to use current syntax.
Key changes:
- Fixed ZPE parser to properly handle the 'Eh' unit token in ORCA output
- Added formatting functions to normalize method and basis set names for ORCA compatibility
- Updated DFT grid keywords from deprecated Grid5/Grid6 to defgrid2/defgrid3
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| arc/parser/adapters/orca.py | Fixed ZPE correction parser to correctly locate the energy value before the 'Eh' unit token |
| arc/job/adapters/orca.py | Added method/basis formatting helpers, updated grid keywords to current ORCA syntax, and integrated formatting into input file generation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _format_orca_method(method: str) -> str: | ||
| """ | ||
| Convert ARC method names to ORCA-friendly labels when needed. | ||
| """ | ||
| if not method: | ||
| return method | ||
| if method.lower() == 'wb97xd': | ||
| logger.warning('ORCA does not support wb97xd; use wb97x or wb97x-d3. ' | ||
| 'wb97xd3 will be normalized to wb97x-d3.') | ||
| return ORCA_METHOD_ALIASES.get(method.lower(), method) | ||
|
|
||
|
|
||
| def _format_orca_basis_token(token: str) -> str: | ||
| """ | ||
| Convert def2 basis tokens to ORCA formatting (e.g., def2tzvp -> def2-TZVP). | ||
| """ | ||
| if not token: | ||
| return token | ||
| parts = token.split('/') | ||
| base = parts[0] | ||
| if base.lower().startswith('def2'): | ||
| base_rest = base[4:] | ||
| if base_rest.startswith('-'): | ||
| base_rest = base_rest[1:] | ||
| if base_rest: | ||
| base = f"def2-{base_rest.lower()}" | ||
| if len(parts) > 1: | ||
| parts = [base] + [part.lower() for part in parts[1:]] | ||
| return '/'.join(parts) | ||
| return base | ||
|
|
||
|
|
||
| def _format_orca_basis(basis: str) -> str: | ||
| """ | ||
| Convert basis strings to ORCA-friendly labels where applicable. | ||
| """ | ||
| if not basis: | ||
| return basis | ||
| return ' '.join(_format_orca_basis_token(token) for token in basis.split()) |
There was a problem hiding this comment.
The new helper functions '_format_orca_method', '_format_orca_basis_token', and '_format_orca_basis' lack test coverage. Since there are existing test files for the ORCA adapter (arc/job/adapters/orca_test.py), tests should be added to verify these formatting functions work correctly with various inputs like 'wb97xd3', 'def2tzvp', 'def2-TZVP', 'def2tzvp/c', etc.
Normalizes method strings to ORCA-friendly labels and addresses the wb97xd deprecation by suggesting alternatives. Formats basis set strings to ORCA formatting (e.g., def2tzvp -> def2-TZVP). Fixed the opt and fine opt keywords to be more inline with 5.0.4 - 6.0.0
Adds unit tests for the ORCA method, basis set token, and basis set formatting helper functions. These tests ensure the correct conversion of basis set names and methods to the format expected by ORCA.
fae9bc3 to
92ecf41
Compare
Addresses an issue where the zero-point energy (ZPE) extraction fails when the "Eh" unit is explicitly present in the line. Improves the parsing logic to correctly identify and extract the ZPE value, regardless of whether the "Eh" unit is present or not.
When ORCA is used for optimisation, we attempt to get the scaling factor, however the parser was broken. It caused the scaling factor to return 0.002 at wb97x-d3. This PR should rectify it now to be:
Also, adjusted the optimisation keywords as they appear to be out of date.